Design and Implementation of "Regional Crawler" as a New Strategy for Crawling the Web
Authors
Abstract
With the rapid growth of the World Wide Web, the significance and popularity of search engines are increasing day by day. However, today's web crawlers are unable to update their search engine indexes at the same pace as the growth of the information available on the web. As a result, users are sometimes unable to find recent or updated information. The Regional Crawler we propose in this paper alleviates the problem of updating indexes and discovering new pages to some extent by gathering the common needs and interests of users in a certain domain, which can be as small as a LAN in a university department or as large as a country. In this paper, we introduce the design of the Regional Crawler architecture and discuss its application in search engines.
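The abstract does not spell out how regional interests are gathered or used, so the following Python sketch is only an illustration of the general idea: aggregate a region's query log into an interest profile and use it to order the crawl frontier so that pages matching that profile are fetched and refreshed first. All function names, the term-frequency profile, and the scoring heuristic are assumptions, not the paper's actual design.

```python
import heapq
from collections import Counter

def build_interest_profile(query_log):
    """Aggregate a region's query terms into a weighted interest profile.
    (Hypothetical: the paper's actual aggregation method is not given in the abstract.)"""
    profile = Counter()
    for query in query_log:
        profile.update(query.lower().split())
    total = sum(profile.values()) or 1
    return {term: count / total for term, count in profile.items()}

def score_url(url_terms, profile):
    """Score a candidate URL by how well its anchor/descriptor terms match regional interests."""
    return sum(profile.get(t.lower(), 0.0) for t in url_terms)

def prioritize_frontier(candidates, profile):
    """Yield URLs so that pages matching the region's interests are fetched first."""
    heap = [(-score_url(terms, profile), url) for url, terms in candidates]
    heapq.heapify(heap)
    while heap:
        neg_score, url = heapq.heappop(heap)
        yield url, -neg_score

if __name__ == "__main__":
    # Toy query log for one "region" (e.g., a university department LAN); all data is illustrative.
    log = ["distributed web crawler", "search engine index freshness", "web crawler scheduling"]
    profile = build_interest_profile(log)
    frontier = [
        ("http://example.org/crawler-design", ["web", "crawler", "design"]),
        ("http://example.org/cooking-tips", ["cooking", "tips"]),
    ]
    for url, s in prioritize_frontier(frontier, profile):
        print(f"{s:.3f}  {url}")
```

In a real deployment the interest profile would presumably be rebuilt periodically as the region's query log grows, so that index freshness follows the region's current interests.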
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download only domain-specific web pages, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
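As an illustration of the URL-queue prioritization this abstract refers to, here is a minimal Python frontier that always pops the highest-scoring URL first. The scoring policy (a caller-supplied relevance value, e.g. inherited from the parent page) and the class name are illustrative assumptions; the cited paper's actual ordering heuristic is not reproduced here.

```python
import heapq
import itertools

class PriorityFrontier:
    """URL frontier that pops the most topic-relevant URL first (a sketch)."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # FIFO tie-breaker for equal scores

    def push(self, url, relevance):
        """Queue a URL with its relevance score; duplicates are ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the relevance to pop high scores first.
        heapq.heappush(self._heap, (-relevance, next(self._counter), url))

    def pop(self):
        """Return (url, relevance) for the best-scoring queued URL."""
        neg_rel, _, url = heapq.heappop(self._heap)
        return url, -neg_rel

    def __len__(self):
        return len(self._heap)

if __name__ == "__main__":
    frontier = PriorityFrontier()
    frontier.push("http://example.org/focused-crawling", 0.9)
    frontier.push("http://example.org/unrelated", 0.1)
    frontier.push("http://example.org/web-ir", 0.7)
    while frontier:
        print(frontier.pop())
```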
A Novel Method for Crawler in Domain-specific Search
A focused crawler is a Web crawler that aims to search and retrieve Web pages from the World Wide Web that are related to a domain-specific topic. Rather than downloading all accessible Web pages, a focused crawler analyzes the frontier of the crawled region to visit only the portion of the Web that contains relevant Web pages, and at the same time tries to skip irrelevant regions. In this paper,...
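To make the "visit relevant pages, skip irrelevant regions" behaviour concrete, the sketch below shows a bare-bones focused-crawl loop that only follows the outlinks of pages whose relevance score clears a threshold. The callables `fetch`, `extract_links`, and `relevance`, the fixed threshold, and the toy in-memory "web" are all hypothetical stand-ins, not the method of the cited paper.

```python
def crawl_focused(seed_urls, fetch, extract_links, relevance, threshold=0.5, budget=100):
    """Minimal focused-crawl loop: expand only pages judged relevant,
    so whole irrelevant regions of the Web are skipped."""
    frontier = list(seed_urls)
    visited = set()
    while frontier and budget > 0:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        budget -= 1
        page = fetch(url)
        if relevance(page) < threshold:
            continue  # irrelevant region: do not follow its outlinks
        frontier.extend(extract_links(page))
    return visited

if __name__ == "__main__":
    # Toy in-memory "web": (text, outlinks) keyed by URL; all data is hypothetical.
    pages = {
        "http://a.example/crawler": ("focused crawler survey", ["http://a.example/ir", "http://b.example/cats"]),
        "http://a.example/ir": ("information retrieval notes", []),
        "http://b.example/cats": ("pictures of cats", ["http://b.example/more-cats"]),
        "http://b.example/more-cats": ("even more cats", []),
    }
    fetch = lambda u: pages.get(u, ("", []))
    extract_links = lambda p: p[1]
    relevance = lambda p: 1.0 if any(w in p[0] for w in ("crawler", "retrieval")) else 0.0
    print(sorted(crawl_focused(["http://a.example/crawler"], fetch, extract_links, relevance)))
```

In the toy run, the irrelevant "cats" page is fetched once but its outlinks are never queued, which is the sense in which an irrelevant region is skipped.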
Distributed Web Crawling Using Network Coordinates
In this report, we outline the relevant background research, design, implementation, and evaluation of a distributed web crawler. Our system is innovative in that it assigns Euclidean coordinates to crawlers and web servers such that the distances in the space give an accurate prediction of download times. We will demonstrate that our method gives the crawler the ability to adapt...
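A minimal sketch of the network-coordinate idea described above, assuming each crawler and web server has already been assigned a Euclidean coordinate (for example by a Vivaldi-style system): Euclidean distance is used as a proxy for download time, and each server is assigned to the crawler predicted to download from it fastest. All coordinates and names are illustrative.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate vectors; in a network
    coordinate system this approximates latency/download time."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_servers(crawlers, servers):
    """Assign each web server to the crawler predicted to download from it fastest.
    `crawlers` and `servers` map names to coordinate tuples (illustrative values)."""
    assignment = {}
    for server, s_coord in servers.items():
        best = min(crawlers, key=lambda c: distance(crawlers[c], s_coord))
        assignment[server] = best
    return assignment

if __name__ == "__main__":
    crawlers = {"crawler-eu": (0.0, 1.0), "crawler-us": (5.0, 0.5)}
    servers = {"example.org": (0.5, 1.2), "example.com": (4.8, 0.1)}
    print(assign_servers(crawlers, servers))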
Focused Web Crawling: A Generic Framework for Specifying the User Interest and for Adaptive Crawling Strategies
Compared to standard web search engines, focused crawlers yield good recall as well as good precision by restricting themselves to a limited domain. In this paper, we do not introduce yet another focused crawler; instead, we introduce a generic framework for focused crawling consisting of two major components: (1) specification of the user interest and measurement of the resulting relevance of a given we...
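The first component, a pluggable specification of the user interest, could look roughly like the interface sketched below; the keyword-overlap implementation and the fixed expansion threshold are illustrative assumptions, not the framework proposed in the cited paper.

```python
from abc import ABC, abstractmethod

class InterestSpec(ABC):
    """Pluggable specification of the user's interest; the crawler depends
    only on this interface, not on any particular relevance model."""

    @abstractmethod
    def relevance(self, page_text: str) -> float:
        """Return a relevance score in [0, 1] for a fetched page."""

class KeywordInterest(InterestSpec):
    """One simple, illustrative implementation: the fraction of interest
    keywords that appear in the page text."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def relevance(self, page_text: str) -> float:
        text = page_text.lower()
        hits = sum(1 for k in self.keywords if k in text)
        return hits / len(self.keywords) if self.keywords else 0.0

def should_expand(page_text: str, spec: InterestSpec, threshold: float = 0.3) -> bool:
    """An adaptive crawling strategy could tune `threshold` as it learns;
    here it is a fixed, illustrative cutoff."""
    return spec.relevance(page_text) >= threshold

if __name__ == "__main__":
    spec = KeywordInterest(["focused", "crawler", "relevance"])
    print(should_expand("A focused crawler estimates page relevance online.", spec))
```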
Collaborative Web Crawler over High-speed Research Network
This paper proposes an idea for constructing a distributed web crawler by utilizing existing high-speed research networks. This is an initial effort of the Web Language Engineering (WLE) project, which investigates techniques for processing the languages found in published web documents. In this paper, we focus on designing a geographically distributed web crawler. Multiple crawlers work collabor...
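One common way for multiple crawlers to collaborate is to partition the URL space among nodes, for example by hashing the host name so that each site is handled by exactly one crawler (which also keeps per-host politeness state in one place). The sketch below illustrates only that routing step; the node names are placeholders, and the cited paper's actual partitioning scheme is not described in the excerpt above.

```python
import hashlib
from urllib.parse import urlparse

def owner_node(url: str, nodes: list) -> str:
    """Route a URL to the crawler node responsible for its host.
    Hashing by host keeps all of a site's URLs on a single node."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]

if __name__ == "__main__":
    nodes = ["crawler-A", "crawler-B", "crawler-C"]  # placeholder node names
    for url in ["http://example.org/a", "http://example.org/b", "http://example.com/x"]:
        print(url, "->", owner_node(url, nodes))
```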
Journal:
Volume, Issue:
Pages: -
Publication date: 2004